Blind source separation (BSS) and sound activity detection (SAD) from a sound source mixture with minimum prior information are two major requirements for computational auditory scene analysis that recognizes auditory events in many environments. In daily environments, BSS suffers from many problems such as reverberation, a permutation problem in frequency-domain processing, and uncertainty about the number of sources in the observed mixture. While many conventional BSS methods resort to a cascaded combination of subprocesses, e.g., frequency-wise separation and permutation resolution, to overcome these problems, their outcomes may be affected by the worst subprocess. Our aim is to develop a unified framework to cope with these problems. Our method, called permutation-free infinite sparse factor analysis (PF-ISFA), is based on a nonparametric Bayesian framework that enables inference without a pre-determined number of sources. It solves BSS, SAD and the permutation problem at the same time. Our method has two key ideas: unified source activities for all the frequency bins and the activation probabilities of all the frequency bins of all the sources. Experiments were carried out to evaluate the separation performance and the SAD performance under four reverberant conditions. For separation performance in the BSS EVAL criteria, our method outperformed conventional complex ISFA under all conditions. For SAD performance, our method outperformed the conventional method by 5.9–0.5% in F-measure under the condition RT20 = 30–600 [ms], respectively.
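As an illustration of the SAD metric mentioned above, the following is a minimal sketch of an F-measure computed over binary activity sequences. The abstract does not state how activity is scored, so the frame-wise comparison, the function name `sad_f_measure`, and the toy sequences are assumptions for illustration only, not the evaluation protocol used in the experiments.

```python
def sad_f_measure(reference, estimate):
    """Frame-wise F-measure between two binary activity sequences.

    Assumption: each entry is 1 (source active in that frame) or 0 (silent).
    """
    if len(reference) != len(estimate):
        raise ValueError("sequences must have the same length")

    # Count true positives, false positives, and false negatives per frame.
    tp = sum(1 for r, e in zip(reference, estimate) if r and e)
    fp = sum(1 for r, e in zip(reference, estimate) if not r and e)
    fn = sum(1 for r, e in zip(reference, estimate) if r and not e)

    if tp == 0:
        # No correctly detected active frame: define the score as 0.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F-measure is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: ground-truth and estimated activity for one source.
reference = [1, 1, 0, 0, 1, 0]
estimate = [1, 0, 0, 1, 1, 0]
score = sad_f_measure(reference, estimate)
```

With several sources, one plausible choice would be to compute this score per source and average the results; that aggregation is likewise an assumption.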